Abstract- Rainfall is a major factor in an
agriculture-based country like India, and rainfall
prediction is one of the most scientifically and
technically demanding problems worldwide. Farmers
in Tamil Nadu still plan their agronomic activities
based on the astrological information in the
Panchangam (Almanac), yet very few attempts have
been made to examine the rationality of this
ancient knowledge system. The Almanac also has a
mathematical basis for predicting meteorological
occurrences. In this study, the rainfall
predictions of one traditional Almanac are studied
in depth for one cycle of 60 Tamil years,
corresponding to the Gregorian years 1957 to 2017.
Many data mining techniques are available for
extracting information. In this work, we applied
the classification algorithms SMO, Random Forest
and REPTree to the almanac rainfall dataset in the
WEKA tool. Among the three, REPTree gave the best
results for predicting almanac-based rainfall.

Keywords-Rainfall Prediction, Almanac, Prediction, 
Classification. 


1. INTRODUCTION 

Data mining is the exploration and analysis of large data
sets in order to discover meaningful patterns and rules. The
key idea is to find effective ways to combine the computer’s
power to process data with the human eye’s ability to detect
patterns. Data mining techniques have been applied in almost
all fields to analyze data through pattern rules,
classification, prediction, decision trees, fuzzy rules and so on.

Rainfall is important for planning the activities of
agriculturists, builders, water supply engineers, and others
whose work depends on nature. India is an agricultural country
and its economy is largely based upon crop productivity. Thus
rainfall prediction becomes a significant factor in
agriculture-based countries like India. Rainfall prediction is
one of the most challenging tasks: although many algorithms
have already been proposed, accurate prediction of rainfall is
still very difficult. In an agricultural country like India,
the success or failure of the crops and water scarcity in any
year is always viewed with the greatest concern.

Astronomy is an area where data mining has been
playing a big role. Several data mining techniques have
been used to solve tasks in astrology, and there has been
increasing research interest in using data mining techniques
to examine the astrology area.

At present the Meteorology Department provides
only short-term weather forecasts, but long-term
forecasting is needed for planning. This can be achieved by
two methods, namely traditional forecasting and scientific
weather forecasting. Traditional forecasting is based on
observations and experience accumulated over a period of time,
using combinations of plants, animals, insects, meteorological
and astronomical indicators, and almanacs or panchangs.
Scientific weather forecasting is based on past records of the
climate that prevailed in the area, using mathematical models.

2. EXISTING APPROACH 

Kolluru Venkata Nagendra [1] surveys a range of
classification techniques used by various researchers,
including Artificial Neural Networks applied to rainfall
forecasting on various parameters. They found that MLP,
Naive Bayesian classifiers and Support Vector Machines
predict rainfall better than other (numerical and statistical)
techniques. They identify that for weekly, monthly and yearly
rainfall forecasting, Naive Bayesian, Feed Forward Neural
Network and SVM give the best performance respectively.

Dhawal Hirani [2] reports a detailed survey of
rainfall prediction methods covering the last 20 years. The
survey found that most researchers used artificial neural
networks for rainfall prediction and obtained significant
results. They found that MLP, BPN, RBFN, SOM and SVM are
suitable techniques for rainfall forecasting.

Seema Mahajan [3] examined the relationship of
Gujarat rainfall with significant universal parameters such as
SLP, SST, U-Wind & Windstress and V-Wind & Windstress. They
took one-month lagged (June - July) data for 40 years (from
1960 to 1999) from the National Oceanic and Atmospheric
Administration (NOAA) and performed multilinear regression on
the generated and measured rainfall series. They found a
correlation coefficient of 0.8377 between the generated and
measured rainfall series.

B. Kavitha Rani [4] applied ANN to predict
summary rainfall data in Thailand and found that back
propagation gives accurate results. Valmik B Nikam [5]
extracted weather data comprising 36 attributes from the IMD
(Indian Meteorological Department), Pune; only the 7
attributes relevant to rainfall prediction were taken, and a
Bayesian approach applied to them gave accurate results.
Jyothis Joseph [6] used clustering and classification
techniques for rainfall prediction and obtained an accuracy of
87%. M. Kannan [7] used a Multiple Linear Regression model for
rainfall prediction and obtained approximate rather than
accurate values. Parneet Kaur [8] and M. Sivasakthi [9] found
that Multilayer Perceptron gives the best accuracy in EDM
(Educational Data Mining) when compared with Naive Bayes, SMO,
J48 and REPTree classification.

N. Vivekanandan [10] [22] applied ANN-based MLP
and RBF to AER at the Joshimath and Tohana rain-gauge
stations, and found that MLP is suitable for Joshimath and RBF
for Tohana. A. Subasini [11] explored the applicability of
data mining techniques to predicting breast cancer, analyzing
the performance of the C5.0, ID3, APRIORI, C4.5 and Naive
Bayes algorithms; experiments found that C5.0 gives the
highest accuracy. Ozlem Terzi [12] estimated monthly rainfall
values of Isparta using monthly rainfall data from the
Senirkent, Uluborlu, Egirdir and Yalvac stations. The most
appropriate algorithm was multilinear regression, which gave a
relative error of 0.7%. M Ramzan Talib [13] collected weather
data for 10 years, from 2007 to 2016, for Faisalabad city,
Pakistan, and applied the K-means clustering and Decision Tree
algorithms to these data. Sarah N. Kohail [14] applied the
knowledge discovery process to extract knowledge from Gaza
city weather data for 9 years, from 1977 to 1985, using
outlier analysis, prediction, classification, association and
clustering techniques. Harneet Kaur [15] gave an overview of
different data mining techniques and tools such as KNIME,
Orange, RapidMiner, Tanagra and WEKA, identified the
challenges in the health care domain, and showed an
implementation of classification and visualization methods in
the Tanagra tool. Godfrey C. Onwubolu [16] used the enhanced
Group Method of Data Handling (e-GMDH) with daily pressure and
temperature and monthly rainfall data, and obtained good
experimental results. Sweta [17] suggested improving quality
of service by properly managing security concerns.

Divya Chauhan [18] reviewed the different
algorithms and techniques for predicting various weather
phenomena such as rainfall, thunderstorms and temperature. A
comparison of the techniques found that decision trees and
k-means clustering give the best results. Dhananjay P. Atole
[19] computed rainfall predictions for 5 Indian cities
(Chennai, Delhi, Mumbai, Nagpur and Pune) using 7 years of
daily, weekly and monthly rainfall data; accurate results were
obtained using the multivariable polynomial regression (MPR)
technique. Siddharth S. Bhatkande [20] used meteorological
data from 2012 to 2015 for various cities and a decision tree
algorithm to classify weather parameters such as minimum and
maximum temperature. They reported that the decision tree is
best for weather prediction.

Norraseth Chantasut [21] computed monthly rainfall
predictions from historical rainfall data from 1941 to 1999
from 245 rainfall monitoring stations in Thailand around the
Chao Phraya River using an ANN, with 372 training patterns and
96 testing patterns. The neural network gave 99.6% and 96.9%
accuracy on the training and testing data respectively. R.
Sukanya [23] compared several classification algorithms (CART,
C4.5, ID3, back propagation and SVM) and concluded that most
researchers now use hybrid methods to obtain more accurate
results.

D Angchok [24] reported that rainfall prediction by Tibetan
astrological theories was acceptable when compared with
meteorological predictions. They suggested that, as very few
scientific studies have ever been conducted on ancient
astro-science and almost all of them have reported encouraging
and positive outputs, there seems to be enormous scope in
studying ancient sciences, especially through
astro-disciplinary approaches. S Sivaprakasam [25] suggested
that the traditional methods of forecasting rainfall may be
riddled with inaccuracies but cannot be ignored altogether. R.
Raja [26] analyzed 90 years (1909-1999) of historical annual
rainfall data of Coimbatore, correlating a particular Tamil
year cycle with the forthcoming Tamil cycle years. Pankaj S.
Kulkarni [27] deals with converting ancient principles related
to astrology into predictions using data mining techniques.
Neelam Chaplot [28] took a total of 102 records, half of them
for persons who are doctors by profession and the other half
for persons who are not. They compared various supervised
classification techniques such as Logistic, Naive Bayes,
Simple Cart, Decision Stump, Decision Table and DTNB. The best
results were produced by simple logistic with 12-fold cross
validation, with an accuracy of 54.902%; the Decision Stump
algorithm with 14-fold cross validation gave an accuracy of
50%. S. R. Gedam [29] analyzed five data mining algorithms:
Bayesian, Decision Table, Multilayer Perceptron, Random
Decision Tree and Random Forest. These gave accuracies of
84.7458%, 98.3051%, 94.9153%, 99.0202% and 100% respectively,
leading to the conclusion that Random Forest is the best
classification method. Rahul Shajan [30] analyzed several data
mining algorithms to correctly classify and predict the health
of a human being. They obtained 81.25% accuracy for the J48,
J48 graft and Naive Bayes algorithms and 93.75% accuracy for
the Random Forest algorithm.


3. PROPOSED APPROACH 

Experimental research methodology has been
adopted for this work. Through an extensive search of the
literature and discussion with experts, the attributes which
influence rainfall have been finalized. For this work, data
are collected from a particular almanac or panchang. The data
are then filtered manually and transformed into the standard
format required by the WEKA tool.

A record of 60 years of data, from the year 1957 to the year
2017, has been taken from the almanac for analysis. From the
panchang or almanac we have considered five attributes that
influence rainfall: King, Minister, Megathipathi, Megam (Cloud
Type) and Rainfall value (Marakkal), each of which has
sub-item sets, as shown in Table I.
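
As a rough illustration of the transformation step, the sketch below writes almanac records in WEKA's ARFF format, with one nominal attribute per row of Table I. The domain values are copied from Table I; the ARFF attribute names, the relation name, the helper to_arff and the two sample records are hypothetical and are not taken from the almanac data used in this work.

```python
# Domain values copied from Table I of this paper.
PLANETS = ["Sun", "Moon", "Mars", "Mercury", "Jupiter", "Venus", "Saturn"]
CLOUDS = ["Aavarta", "Samvarta", "Pushkara", "Drona", "Kaala",
          "Neela", "Varuna", "Vayu", "Dhamo"]
RAINFALL = ["Kuruni", "Pathaku", "Mukkuruni", "Thooni"]

# Attribute names are an assumption (spaces avoided so no quoting is needed).
ATTRIBUTES = [
    ("King", PLANETS),
    ("Minister", PLANETS),
    ("Megathipathi", PLANETS),
    ("Megam", CLOUDS),
    ("AlmanacRainfall", RAINFALL),  # class attribute, listed last
]


def to_arff(relation, rows):
    """Return ARFF text for rows of nominal values (hypothetical helper)."""
    lines = ["@relation " + relation, ""]
    for name, domain in ATTRIBUTES:
        lines.append("@attribute %s {%s}" % (name, ",".join(domain)))
    lines += ["", "@data"]
    for row in rows:
        if len(row) != len(ATTRIBUTES):
            raise ValueError("expected %d values, got %d"
                             % (len(ATTRIBUTES), len(row)))
        # Reject values outside the declared domain of each attribute.
        for value, (name, domain) in zip(row, ATTRIBUTES):
            if value not in domain:
                raise ValueError("%r is not a valid %s" % (value, name))
        lines.append(",".join(row))
    return "\n".join(lines) + "\n"


# Made-up records for illustration only, NOT real almanac entries.
SAMPLE_ROWS = [
    ("Sun", "Moon", "Mars", "Aavarta", "Kuruni"),
    ("Venus", "Saturn", "Jupiter", "Drona", "Thooni"),
]

arff_text = to_arff("almanac_rainfall", SAMPLE_ROWS)
print(arff_text)
```

Declaring every attribute as nominal, with the rainfall class last, matches the way Table I lists a fixed set of values for each attribute; a file of this shape can be opened in WEKA and the last attribute selected as the class.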


4. RESULT & DISCUSSION 

TABLE II. PERFORMANCE OF VARIOUS
CLASSIFICATION ALGORITHMS

                   SMO                 Random Forest       REPTree
Class              Precision  Recall   Precision  Recall   Precision  Recall
Kuruni             0.875      0.933    0.929      0.867    0.933      0.933
Pathaku            1          0.857    1          0.857    1          0.857
Mukkuruni          0.947      0.947    0.900      0.947    0.947      0.947
Thooni             0.947      0.947    0.950      1        0.950      1
Weighted Average   0.935      0.933    0.935      0.933    0.951      0.950

The results were obtained by testing and analyzing three
data mining classification algorithms, SMO, RandomForest and
REPTree, as shown in Table II above. The overall accuracy of
each algorithm is given below in Table III.
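
The weighted averages in Table II can be checked by weighting each class by its number of instances. The sketch below does this for the per-class values copied from Table II. The paper does not report class counts: the counts 15, 7, 19 and 19 are inferred here from the recall values (for example 0.857 is 6/7) together with an assumed total of 60 records, one per year, so they should be read as an assumption rather than as reported data.

```python
# ASSUMPTION: class sizes inferred from Table II recalls, not stated in the paper.
SUPPORT = {"Kuruni": 15, "Pathaku": 7, "Mukkuruni": 19, "Thooni": 19}

# Per-class (precision, recall) pairs copied from Table II.
TABLE_II = {
    "SMO": {"Kuruni": (0.875, 0.933), "Pathaku": (1.0, 0.857),
            "Mukkuruni": (0.947, 0.947), "Thooni": (0.947, 0.947)},
    "RandomForest": {"Kuruni": (0.929, 0.867), "Pathaku": (1.0, 0.857),
                     "Mukkuruni": (0.900, 0.947), "Thooni": (0.950, 1.0)},
    "REPTree": {"Kuruni": (0.933, 0.933), "Pathaku": (1.0, 0.857),
                "Mukkuruni": (0.947, 0.947), "Thooni": (0.950, 1.0)},
}

# Weighted-average (precision, recall) row as printed in Table II.
REPORTED = {
    "SMO": (0.935, 0.933),
    "RandomForest": (0.935, 0.933),
    "REPTree": (0.951, 0.950),
}


def weighted_average(per_class, support):
    """Average precision and recall, weighting each class by its size."""
    total = sum(support.values())
    precision = sum(support[c] * p for c, (p, _) in per_class.items()) / total
    recall = sum(support[c] * r for c, (_, r) in per_class.items()) / total
    return precision, recall


results = {name: weighted_average(per_class, SUPPORT)
           for name, per_class in TABLE_II.items()}

for name, (precision, recall) in results.items():
    print("%-13s precision=%.3f recall=%.3f" % (name, precision, recall))
```

Under the assumed class sizes, each recomputed average agrees with the reported row to within 0.001, which is the precision the rounded table values allow.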


TABLE I. ATTRIBUTES INFLUENCING
RAINFALL

S. No.  Attribute           Description                  Domain Value
1       King                Ruling Planet of the year    {Sun, Moon, Mars, Mercury,
                                                          Jupiter, Venus, Saturn}
2       Minister            Minister Planet of the year  {Sun, Moon, Mars, Mercury,
                                                          Jupiter, Venus, Saturn}
3       Megathipathi        Planet supporting rainfall   {Sun, Moon, Mars, Mercury,
                            for the year                  Jupiter, Venus, Saturn}
4       Megam (Cloud Type)  Type/Formation of the Cloud  {Aavarta, Samvarta, Pushkara,
                                                          Drona, Kaala, Neela, Varuna,
                                                          Vayu, Dhamo}
5       Almanac Rainfall    Rainfall of the year as per  {Kuruni, Pathaku, Mukkuruni,
                            Almanac                       Thooni}


In this work, data mining classification techniques are used
to predict rainfall. WEKA is used to apply the classification
techniques and to make predictions. The output has been
analyzed with three classification algorithms: SMO,
RandomForest and REPTree.


TABLE III. ACCURACY OF CLASSIFICATION
ALGORITHMS

Data Mining Algorithm    Accuracy (%)
SMO                      93.33
RandomForest             93.33
REPTree                  95


Both the SMO and RandomForest algorithms achieved an
accuracy of 93.33%. The best accuracy, 95%, was achieved by
the REPTree algorithm. The following chart shows the accuracy
of each algorithm.

Fig 2. Comparison of Accuracy (chart title: Accuracy of Algorithms)

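
As a small worked check of the reported accuracies, the sketch below converts each percentage into the number of correctly classified records it implies. It assumes the data set holds 60 records, one per year of the cycle; the paper does not state the instance count explicitly, and the helper names are hypothetical.

```python
# ASSUMPTION: one record per year of the 60-year cycle.
TOTAL_RECORDS = 60

# Accuracy values (%) copied from Table III.
REPORTED_ACCURACY = {"SMO": 93.33, "RandomForest": 93.33, "REPTree": 95.0}


def implied_correct(accuracy_percent, total):
    """Number of correct predictions implied by an accuracy percentage."""
    return round(accuracy_percent / 100.0 * total)


def accuracy_percent(correct, total):
    """Accuracy as a percentage of correctly classified records."""
    return 100.0 * correct / total


correct_counts = {name: implied_correct(acc, TOTAL_RECORDS)
                  for name, acc in REPORTED_ACCURACY.items()}

for name, correct in correct_counts.items():
    print("%-13s %d/%d correct = %.2f%%"
          % (name, correct, TOTAL_RECORDS,
             accuracy_percent(correct, TOTAL_RECORDS)))
```

Under this assumption the gap between REPTree and the other two algorithms amounts to a single record (57 versus 56 correct out of 60).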
5. CONCLUSION

In this paper, rainfall-predicting attributes and data
sets were taken from the Almanac for one cycle of 60 Tamil
years, corresponding to the Gregorian years 1957 to 2017.
Three data mining classification algorithms, SMO, RandomForest
and REPTree, were applied to this data set using the WEKA
tool. REPTree gave the highest accuracy, 95%. So, the existing
REPTree algorithm is sufficient to find the similar patterns
in the almanac rainfall predictions.